
Chapter 3  Structure of MS-DOS Application Programs

  Programs that run under MS-DOS come in two basic flavors: .COM programs,
  which have a maximum size of approximately 64 KB, and .EXE programs, which
  can be as large as available memory. In Intel 8086 parlance, .COM programs
  fit the tiny model, in which all segment registers contain the same value;
  that is, the code and data are mixed together. In contrast, .EXE programs
  fit the small, medium, or large model, in which the segment registers
  contain different values; that is, the code, data, and stack reside in
  separate segments. .EXE programs can have multiple code and data segments,
  which are respectively addressed by long calls and by manipulation of the
  data segment (DS) register.

  A .COM-type program resides on the disk as an absolute memory image, in a
  file with the extension .COM. The file does not have a header or any other
  internal identifying information. A .EXE program, on the other hand,
  resides on the disk in a special type of file with a unique header, a
  relocation map, a checksum, and other information that is (or can be) used
  by MS-DOS.

  Both .COM and .EXE programs are brought into memory for execution by the
  same mechanism: the EXEC function, which constitutes the MS-DOS loader.
  EXEC can be called with the filename of a program to be loaded by
  COMMAND.COM (the normal MS-DOS command interpreter), by other shells or
  user interfaces, or by another program that was previously loaded by EXEC.
  If there is sufficient free memory in the transient program area, EXEC
  allocates a block of memory to hold the new program, builds the program
  segment prefix (PSP) at its base, and then reads the program into memory
  immediately above the PSP. Finally, EXEC sets up the segment registers and
  the stack and transfers control to the program.

  When it is invoked, EXEC can be given the addresses of additional
  information, such as a command tail, file control blocks, and an
  environment block; if supplied, this information will be passed on to the
  new program. (The exact procedure for using the EXEC function in your own
  programs is discussed, with examples, in Chapter 12.)

  .COM and .EXE programs are often referred to as transient programs. A
  transient program "owns" the memory block it has been allocated and has
  nearly total control of the system's resources while it is executing. When
  the program terminates, either because it is aborted by the operating
  system or because it has completed its work and systematically performed a
  final exit back to MS-DOS, the memory block is then freed (hence the term
  transient) and can be used by the next program in line to be loaded.


The Program Segment Prefix

  A thorough understanding of the program segment prefix is vital to
  successful programming under MS-DOS. It is a reserved area, 256 bytes
  long, that is set up by MS-DOS at the base of the memory block allocated
  to a transient program. The PSP contains some linkages to MS-DOS that can
  be used by the transient program, some information MS-DOS saves for its
  own purposes, and some information MS-DOS passes to the transient
  program, to be used or not as the program requires (Figure 3-1).

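  Offset
  0000H +--------------------------------------------------------+
        |                        Int 20H                         |
  0002H +--------------------------------------------------------+
        |            Segment, end of allocation block            |
  0004H +--------------------------------------------------------+
        |                        Reserved                        |
  0005H +--------------------------------------------------------+
        |        Long call to MS-DOS function dispatcher         |
  000AH +--------------------------------------------------------+
        |        Previous contents of termination handler        |
        |               interrupt vector (Int 22H)               |
  000EH +--------------------------------------------------------+
        | Previous contents of Ctrl-C interrupt vector (Int 23H) |
  0012H +--------------------------------------------------------+
        |      Previous contents of critical-error handler       |
        |               interrupt vector (Int 24H)               |
  0016H +--------------------------------------------------------+
        |                        Reserved                        |
  002CH +--------------------------------------------------------+
        |          Segment address of environment block          |
  002EH +--------------------------------------------------------+
        |                        Reserved                        |
  005CH +--------------------------------------------------------+
        |             Default file control block #1              |
  006CH +--------------------------------------------------------+
        |             Default file control block #2              |
        |              (overlaid if FCB #1 opened)               |
  0080H +--------------------------------------------------------+
        |                                                        |
        |  Command tail and default disk transfer area (buffer)  |
        |                                                        |
  00FFH +--------------------------------------------------------+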

  Figure 3-1.  The structure of the program segment prefix.
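
  The layout in Figure 3-1 can also be written as a MASM STRUC, so that a
  program can refer to PSP fields by name rather than by numeric offset.
  This is only a sketch: the structure name pspmap and every field name in
  it are invented for this illustration and are not defined by MS-DOS or
  by MASM.

```asm
pspmap  struc                           ; hypothetical map of the PSP
pspexit dw      ?                       ; 0000h Int 20H instruction
psptop  dw      ?                       ; 0002h segment, end of allocation block
pspres1 db      ?                       ; 0004h reserved
pspdisp db      5 dup (?)               ; 0005h long call to function dispatcher
pspterm dd      ?                       ; 000ah previous Int 22H vector
pspbrk  dd      ?                       ; 000eh previous Int 23H vector
psperr  dd      ?                       ; 0012h previous Int 24H vector
pspres2 db      22 dup (?)              ; 0016h reserved
pspenv  dw      ?                       ; 002ch segment of environment block
pspres3 db      46 dup (?)              ; 002eh reserved
pspfcb1 db      16 dup (?)              ; 005ch default file control block #1
pspfcb2 db      20 dup (?)              ; 006ch default file control block #2
psptail db      128 dup (?)             ; 0080h command tail and default DTA
pspmap  ends                            ; total length = 0100h bytes
```

  Each field name evaluates to its offset within the structure, so with
  ES pointing to the PSP and BX containing zero, the operand
  es:[bx].pspenv addresses the environment-block pointer at offset 002CH.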

  In the first versions of MS-DOS, the PSP was designed to be compatible
  with a control area that was built beneath transient programs under
  Digital Research's venerable CP/M operating system, so that programs could
  be ported to MS-DOS without extensive logical changes. Although MS-DOS has
  evolved considerably since those early days, the structure of the PSP is
  still recognizably similar to its CP/M equivalent. For example, offset
  0000H in the PSP contains a linkage to the MS-DOS process-termination
  handler, which cleans up after the program has finished its job and
  performs a final exit. Similarly, offset 0005H in the PSP contains a
  linkage to the MS-DOS function dispatcher, which performs disk operations,
  console input/output, and other such services at the request of the
  transient program. Thus, calls to PSP:0000 and PSP:0005 have the same
  effect as CALL 0000 and CALL 0005 under CP/M. (These linkages are not the
  "approved" means of obtaining these services, however.)

  The word at offset 0002H in the PSP contains the segment address of the
  top of the transient program's allocated memory block. The program can use
  this value to determine whether it should request more memory to do its
  job or whether it has extra memory that it can release for use by other
  processes.
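
  As a minimal sketch of how a .COM program might use this word, the
  fragment below computes the size of its own memory block in paragraphs.
  It assumes that DS and CS still hold the segment of the PSP, as they do
  when a .COM program first receives control.

```asm
        mov     ax,word ptr ds:[0002h]  ; AX = segment, end of allocation block
        mov     bx,cs                   ; BX = segment of PSP (.COM program)
        sub     ax,bx                   ; AX = paragraphs from PSP to end
```

  Multiplying the result by 16 gives the size of the block in bytes.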

  Offsets 000AH through 0015H in the PSP contain the previous contents of
  the interrupt vectors for the termination, Ctrl-C, and critical-error
  handlers. If the transient program alters these vectors for its own
  purposes, MS-DOS restores the original values saved in the PSP when the
  program terminates.

  The word at PSP offset 002CH holds the segment address of the environment
  block, which contains a series of ASCIIZ strings (sequences of ASCII
  characters terminated by a null, or zero, byte). The environment block is
  inherited from the program that called the EXEC function to load the
  currently executing program. It contains such information as the current
  search path used by COMMAND.COM to find executable programs, the location
  on the disk of COMMAND.COM itself, and the format of the user prompt used
  by COMMAND.COM.
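
  The following fragment is one possible way to step through the strings
  in the environment block. It is a sketch only: it assumes that DS
  points to the PSP, it changes ES, and the labels env1 and env2 are
  hypothetical. It relies on the usual convention that an empty string
  (a lone null byte) follows the last ASCIIZ string in the block.

```asm
        mov     es,word ptr ds:[002ch]  ; ES = segment of environment block
        xor     di,di                   ; ES:DI -> first ASCIIZ string
        xor     al,al                   ; AL = null byte to search for
        cld                             ; scan toward higher addresses
env1:   cmp     byte ptr es:[di],0      ; empty string = end of block?
        je      env2                    ; yes, all strings visited
                                        ; ES:DI -> start of one string here
        mov     cx,0ffffh               ; no practical limit on scan length
        repne   scasb                   ; advance DI past terminating null
        jmp     env1                    ; look at next string
env2:                                   ; ES:DI -> null that ends the block
```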

  The command tail (the remainder of the command line that invoked the
  transient program, after the program's name) is copied into the PSP
  starting at offset 0081H. The length of the command tail, not including
  the return character at its end, is placed in the byte at offset 0080H.
  Redirection or piping parameters and their associated filenames do not
  appear in the portion of the command line (the command tail) that is
  passed to the transient program, because redirection is transparent to
  applications.
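
  A program that wants to treat the command tail as an ASCIIZ string
  could use the length byte to find the return character and overwrite
  it with a null. The fragment below is a sketch that assumes DS points
  to the PSP (true at entry for both .COM and .EXE programs).

```asm
        mov     bl,byte ptr ds:[0080h]  ; BL = length of command tail
        xor     bh,bh                   ; BX = length as a word
        mov     byte ptr [bx+0081h],0   ; replace the return with a null
        mov     si,0081h                ; DS:SI -> ASCIIZ command tail
```

  If the length byte is zero, the null lands at offset 0081H and the
  result is an empty string.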

  To provide compatibility with CP/M, MS-DOS parses the first two parameters
  in the command tail into two default file control blocks (FCBs) at
  PSP:005CH and PSP:006CH, under the assumption that they may be filenames.
  However, if the parameters are filenames that include a path
  specification, only the drive code will be valid in these default FCBs,
  because FCB-type file- and record-access functions do not support
  hierarchical file structures. Although the default FCBs were an aid in
  earlier years, when compatibility with CP/M was more of a concern, they
  are essentially useless in modern MS-DOS application programs that must
  provide full path support. (File control blocks are discussed in detail in
  Chapter 8 and hierarchical file structures are discussed in Chapter 9.)

  The 128-byte area from 0080H through 00FFH in the PSP also serves as the
  default disk transfer area (DTA), which is set by MS-DOS before passing
  control to the transient program. If the program does not explicitly
  change the DTA, any file read or write operations requested with the FCB
  group of function calls automatically use this area as a data buffer. This
  is rarely useful and is another facet of MS-DOS's handling of the PSP that
  is present only for compatibility with CP/M.
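
  A program that uses FCB functions and also needs its command tail can
  move the DTA elsewhere with Int 21H Function 1AH before it performs any
  file operations. In this sketch, the buffer name mydta and its size are
  assumptions made for the example, and DS is assumed to address the
  segment that contains the buffer.

```asm
        mov     ah,1ah                  ; function 1ah = set DTA address
        mov     dx,offset mydta         ; DS:DX = address of new DTA
        int     21h                     ; transfer to MS-DOS

; elsewhere in the same segment:
mydta   db      128 dup (?)             ; hypothetical private buffer
```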

  
  WARNING
    Programs must not alter any part of the PSP below offset 005CH.
  


Introduction to .COM Programs

  Programs of the .COM persuasion are stored in disk files that hold an
  absolute image of the machine instructions to be executed. Because the
  files contain no relocation information, they are more compact, and are
  loaded for execution slightly faster, than equivalent .EXE files. Note
  that MS-DOS does not attempt to ascertain whether a .COM file actually
  contains executable code (there is no signature or checksum, as in the
  case of a .EXE file); it simply brings any file with the .COM extension
  into memory and jumps to it.

  Because .COM programs are loaded immediately above the program segment
  prefix and do not have a header that can specify another entry point, they
  must always have an origin of 0100H, which is the length of the PSP.
  Location 0100H must contain an executable instruction. The maximum length
  of a .COM program is 65,536 bytes, minus the length of the PSP (256 bytes)
  and a mandatory word of stack (2 bytes), which leaves 65,278 bytes.

  When control is transferred to the .COM program from MS-DOS, all of the
  segment registers point to the PSP (Figure 3-2). The stack pointer
  register contains 0FFFEH if memory allows; otherwise, it is set as high as
  possible in memory minus 2 bytes. (MS-DOS pushes a zero word on the stack
  before entry.)

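     SS:SP  +--------------------------------------------------------+
            |                                                        |
            |       Stack grows downward from top of segment         |
            |                           |                            |
            |                           v                            |
            |                           ^                            |
            |                           |                            |
            |                 Program code and data                  |
            |                                                        |
  CS:0100H  +--------------------------------------------------------+
            |                 Program segment prefix                 |
  CS:0000H  +--------------------------------------------------------+
  DS:0000H
  ES:0000H
  SS:0000H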

  Figure 3-2.  A memory image of a typical .COM-type program after loading.
  The contents of the .COM file are brought into memory just above the
  program segment prefix. Program code and data are mixed together in the
  same segment, and all segment registers contain the same value.

  Although the size of an executable .COM file can't exceed 64 KB, the
  current versions of MS-DOS allocate all of the transient program area to
  .COM programs when they are loaded. Because many such programs date from
  the early days of MS-DOS and are not necessarily "well-behaved" in their
  approach to memory management, the operating system simply makes the
  worst-case assumption and gives .COM programs everything that is
  available. If a .COM program wants to use the EXEC function to invoke
  another process, it must first shrink down its memory allocation to the
  minimum memory it needs in order to continue, taking care to protect its
  stack. (This is discussed in more detail in Chapter 12.)
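
  The fragment below sketches one way a .COM program might give back the
  memory it does not need, using Int 21H Function 4AH. The names stk and
  error and the 128-byte stack size are assumptions made for this
  example. The code relies on ES still pointing to the PSP, as it does at
  entry, and it computes the size at run time from the offset of the end
  of the program (which already includes the 256 bytes of the PSP,
  because of the ORG 100H).

```asm
        mov     sp,offset stk           ; move stack into area we keep
        mov     bx,offset stk           ; BX = offset of first unused byte
        add     bx,15                   ; round up to next paragraph
        mov     cl,4
        shr     bx,cl                   ; BX = paragraphs to keep
        mov     ah,4ah                  ; function 4ah = resize memory block
        int     21h                     ; ES = segment of block (PSP)
        jc      error                   ; carry set if resize failed

; at the end of the program's code and data:
        dw      64 dup (?)              ; hypothetical 128-byte stack
stk     label   byte                    ; new top of stack
```

  Note that relocating the stack discards the zero word that MS-DOS
  pushed at entry, so a program that does this should terminate with
  Int 21H Function 4CH rather than with a NEAR RETURN.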

  When a .COM program finishes executing, it can return control to MS-DOS by
  several means. The preferred method is Int 21H Function 4CH, which allows
  the program to pass a return code back to the program, shell, or batch
  file that invoked it. However, if the program is running under MS-DOS
  version 1, it must exit by means of Int 20H, Int 21H Function 0, or a
  NEAR RETURN. (Because a word of zero was pushed onto the stack at entry, a
  NEAR RETURN causes a transfer to PSP:0000, which contains an Int 20H
  instruction.)
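
  A .COM program that must run under any version of MS-DOS could choose
  its exit method at run time. The following sketch uses Int 21H Function
  30H, which returns the major version number in AL (zero under version
  1); the label exit2 is hypothetical.

```asm
        mov     ah,30h                  ; function 30h = get MS-DOS version
        int     21h                     ; AL = major version number
        cmp     al,2                    ; version 2.0 or later?
        jae     exit2                   ; yes, use the preferred method
        int     20h                     ; no, version 1 (needs CS = PSP)
exit2:  mov     ax,4c00h                ; exit, return code = 0
        int     21h                     ; transfer to MS-DOS
```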

  A .COM-type application can be linked together from many separate object
  modules. All of the modules must use the same code-segment name and class
  name, and the module with the entry point at offset 0100H within the
  segment must be linked first. In addition, all of the procedures within a
  .COM program should have the NEAR attribute, because all executable code
  resides in one segment.

  When linking a .COM program, the linker will display the message

  Warning: no stack segment

  This message can be ignored. The linker output is a .EXE file, which must
  be converted into a .COM file with the MS-DOS EXE2BIN utility before
  execution. You can then delete the .EXE file. (An example of this process
  is provided in Chapter 4.)

An Example .COM Program

  The HELLO.COM program listed in Figure 3-3 demonstrates the structure of
  a simple assembly-language program that is destined to become a .COM file.
  (You may find it helpful to compare this listing with the HELLO.EXE
  program later in this chapter.) Because this program is so short and
  simple, a relatively high proportion of the source code is actually
  assembler directives that do not result in any executable code.

  The NAME statement simply provides a module name for use during the
  linkage process. This aids understanding of the map that the linker
  produces. In MASM versions 5.0 and later, the module name is always the
  same as the filename, and the NAME statement is ignored.

  The PAGE command, when used with two operands, as in line 2, defines the
  length and width of the page. These default respectively to 66 lines and
  80 characters. If you use the PAGE command without any operands, a
  formfeed is sent to the printer and a heading is printed. In larger
  programs, use the PAGE command liberally to place each of your subroutines
  on separate pages for easy reading.

  The TITLE command, in line 3, specifies the text string (limited to 60
  characters) that is to be printed at the upper left corner of each page.
  The TITLE command is optional and cannot be used more than once in each
  assembly-language source file.

  
   1:          name    hello
   2:          page    55,132
   3:          title   HELLO.COM--print hello on terminal
   4:
   5:  ;
   6:  ; HELLO.COM:    demonstrates various components
   7:  ;               of a functional .COM-type assembly-
   8:  ;               language program, and an MS-DOS
   9:  ;               function call.
  10:  ;
  11:  ; Ray Duncan, May 1988
  12:  ;
  13:
  14:  stdin   equ     0               ; standard input handle
  15:  stdout  equ     1               ; standard output handle
  16:  stderr  equ     2               ; standard error handle
  17:
  18:  cr      equ     0dh             ; ASCII carriage return
  19:  lf      equ     0ah             ; ASCII linefeed
  20:
  21:
  22:  _TEXT   segment word public 'CODE'
  23:
  24:          org     100h            ; .COM files always have
  25:                                  ; an origin of 100h
  26:
  27:          assume  cs:_TEXT,ds:_TEXT,es:_TEXT,ss:_TEXT
  28:
  29:  print   proc    near            ; entry point from MS-DOS
  30:
  31:          mov     ah,40h          ; function 40h = write
  32:          mov     bx,stdout       ; handle for standard output
  33:          mov     cx,msg_len      ; length of message
  34:          mov     dx,offset msg   ; address of message
  35:          int     21h             ; transfer to MS-DOS
  36:
  37:          mov     ax,4c00h        ; exit, return code = 0
  38:          int     21h             ; transfer to MS-DOS
  39:
  40:  print   endp
  41:
  42:
  43:  msg     db      cr,lf           ; message to display
  44:          db      'Hello World!',cr,lf
  45:
  46:  msg_len equ     $-msg           ; length of message
  47:
  48:
  49:  _TEXT   ends
  50:
  51:          end     print           ; defines entry point
  

  Figure 3-3.  The HELLO.COM program listing.

  Dropping down past a few comments and EQU statements, we come to a
  declaration of a code segment that begins in line 22 with a SEGMENT
  command and ends in line 49 with an ENDS command. The label in the
  leftmost field of line 22 gives the code segment the name _TEXT. The
  operand fields at the right end of the line give the segment the
  attributes WORD, PUBLIC, and 'CODE'. (You might find it helpful to read
  the Microsoft Macro Assembler manual for detailed explanations of each
  possible segment attribute.)

  Because this program is going to be converted into a .COM file, all of its
  executable code and data areas must lie within one code segment. The
  program must also have its origin at offset 0100H (immediately above the
  program segment prefix), which is taken care of by the ORG statement
  in line 24.

  Following the ORG statement, we encounter an ASSUME statement on line
  27. The concept of ASSUME often baffles new assembly-language programmers.
  In a way, ASSUME doesn't "do" anything; it simply tells the assembler
  which segment registers you are going to use to point to the various
  segments of your program, so that the assembler can provide segment
  overrides when they are necessary. It's important to notice that the
  ASSUME statement doesn't take care of loading the segment registers with
  the proper values; it merely notifies the assembler of your intent to do
  that within the program. (Remember that, in the case of a .COM program,
  MS-DOS initializes all the segment registers before entry to point to the
  PSP.)

  Within the code segment, we come to another type of block declaration that
  begins with the PROC command on line 29 and closes with ENDP on line 40.
  These two directives declare the beginning and end of a procedure, a
  block of executable code that performs a single distinct function. The
  label in the leftmost field of the PROC statement (in this case, print)
  gives the procedure a name. The operand field gives it an attribute. If
  the procedure carries the NEAR attribute, only other code in the same
  segment can call it, whereas if it carries the FAR attribute, code located
  anywhere in the CPU's memory-addressing space can call it. In .COM
  programs, all procedures carry the NEAR attribute.
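
  To illustrate a NEAR call between two procedures in the same segment,
  the print procedure of Figure 3-3 could be split as sketched here. The
  helper procedure pmsg is hypothetical; msg, msg_len, and stdout are the
  names already defined in the HELLO.COM listing.

```asm
print   proc    near                    ; entry point from MS-DOS

        mov     cx,msg_len              ; CX = length of message
        mov     dx,offset msg           ; DS:DX = address of message
        call    pmsg                    ; near call, pushes only IP

        mov     ax,4c00h                ; exit, return code = 0
        int     21h                     ; transfer to MS-DOS

print   endp

pmsg    proc    near                    ; write CX bytes at DS:DX

        mov     ah,40h                  ; function 40h = write
        mov     bx,stdout               ; handle for standard output
        int     21h                     ; transfer to MS-DOS
        ret                             ; assembled as a near return

pmsg    endp
```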

  For the purposes of this example program, I have kept the print procedure
  ridiculously simple. It calls MS-DOS Int 21H Function 40H to send the
  message Hello World! to the video screen, and calls Int 21H Function 4CH
  to terminate the program.

  The END statement in line 51 tells the assembler that it has reached the
  end of the source file and also specifies the entry point for the program.
  If the entry point is not a label located at offset 0100H, the .EXE file
  resulting from the assembly and linkage of this source program cannot be
  converted into a .COM file.


Introduction to .EXE Programs

  We have just discussed a program that was written in such a way that it
  could be assembled into a .COM file. Such a program is simple in
  structure, so a programmer who needs to put together this kind of quick
  utility can concentrate on the program logic and do a minimum amount of
  worrying about control of the assembler. However, .COM-type programs have
  some definite disadvantages, and so most serious assembly-language efforts
  for MS-DOS are written to be converted into .EXE files.

  Although .COM programs are effectively restricted to a total size of 64 KB
  for machine code, data, and stack combined, .EXE programs can be
  practically unlimited in size (up to the limit of the computer's available
  memory). .EXE programs also place the code, data, and stack in separate
  parts of the file. Although the normal MS-DOS program loader does not take
  advantage of this feature of .EXE files, the ability to load different
  parts of large programs into several separate memory fragments, as well as
  the opportunity to designate a "pure" code portion of your program that
  can be shared by several tasks, is very significant in multitasking
  environments such as Microsoft Windows.

  The MS-DOS loader always brings a .EXE program into memory immediately
  above the program segment prefix, although the order of the code, data,
  and stack segments may vary (Figure 3-4). The .EXE file has a header, or
  block of control information, with a characteristic format (Figures 3-5
  and 3-6). The size of this header varies according to the number of
  program instructions that need to be relocated at load time, but it is
  always a multiple of 512 bytes.

  Before MS-DOS transfers control to the program, the initial values of the
  code segment (CS) register and instruction pointer (IP) register are
  calculated from the entry-point information in the .EXE file header and
  the program's load address. This information derives from an END statement
  in the source code for one of the program's modules. The data segment (DS)
  and extra segment (ES) registers are made to point to the PSP so that the
  program can access the environment-block pointer, command tail, and other
  useful information contained there.

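     SS:SP +--------------------------------------------------------+
           |                                                        |
           |                     Stack segment:                     |
           |        stack grows downward from top of segment        |
           |                           |                            |
           |                           v                            |
  SS:0000H +--------------------------------------------------------+
           |                      Data segment                      |
           +--------------------------------------------------------+
           |                      Program code                      |
  CS:0000H +--------------------------------------------------------+
           |                 Program segment prefix                 |
  DS:0000H +--------------------------------------------------------+
  ES:0000H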

  Figure 3-4.  A memory image of a typical .EXE-type program immediately
  after loading. The contents of the .EXE file are relocated and brought
  into memory above the program segment prefix. Code, data, and stack reside
  in separate segments and need not be in the order shown here. The entry
  point can be anywhere in the code segment and is specified by the END
  statement in the main module of the program. When the program receives
  control, the DS (data segment) and ES (extra segment) registers point to
  the program segment prefix; the program usually saves this value and then
  resets the DS and ES registers to point to its data area.
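
  The save-and-reset sequence mentioned in the caption might look like
  the following sketch. The variable name pspseg is an assumption made
  for this example; _DATA is the data-segment name used in the HELLO.EXE
  listing later in this chapter.

```asm
; in the data segment:
pspseg  dw      0                       ; hypothetical variable for PSP segment

; at the program's entry point:
        mov     ax,_DATA                ; make our data segment
        mov     ds,ax                   ; addressable
        mov     pspseg,es               ; ES still points to PSP; save it
        mov     es,ax                   ; now ES can address our data too
```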

  The initial contents of the stack segment (SS) and stack pointer (SP)
  registers come from the header. This information derives from the
  declaration of a segment with the attribute STACK somewhere in the
  program's source code. The memory space allocated for the stack may be
  initialized or uninitialized, depending on the stack-segment definition;
  many programmers like to initialize the stack memory with a recognizable
  data pattern so that they can inspect memory dumps and determine how much
  stack space is actually used by the program.
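
  One way to initialize the stack with a recognizable pattern is
  sketched below. The pattern string and the 128-byte size are
  arbitrary choices made for this example.

```asm
STACK   segment para stack 'STACK'

        db      16 dup ('STACK   ')     ; 16 x 8 bytes = 128-byte stack

STACK   ends
```

  In a memory dump taken after the program has run, the part of the
  stack that still shows the pattern was never used.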

  When a .EXE program finishes processing, it should return control to
  MS-DOS through Int 21H Function 4CH. Other methods are available, but
  they offer no advantages and are considerably less convenient (because
  they usually require the CS register to point to the PSP).

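  Byte
  offset
  0000H +--------------------------------------------------------+
        |        First part of .EXE file signature (4DH)         |
  0001H +--------------------------------------------------------+
        |        Second part of .EXE file signature (5AH)        |
  0002H +--------------------------------------------------------+
        |                 Length of file MOD 512                 |
  0004H +--------------------------------------------------------+
        |    Size of file in 512-byte pages, including header    |
  0006H +--------------------------------------------------------+
        |            Number of relocation-table items            |
  0008H +--------------------------------------------------------+
        |      Size of header in paragraphs (16-byte units)      |
  000AH +--------------------------------------------------------+
        |   Minimum number of paragraphs needed above program    |
  000CH +--------------------------------------------------------+
        |   Maximum number of paragraphs desired above program   |
  000EH +--------------------------------------------------------+
        |          Segment displacement of stack module          |
  0010H +--------------------------------------------------------+
        |            Contents of SP register at entry            |
  0012H +--------------------------------------------------------+
        |                     Word checksum                      |
  0014H +--------------------------------------------------------+
        |            Contents of IP register at entry            |
  0016H +--------------------------------------------------------+
        |          Segment displacement of code module           |
  0018H +--------------------------------------------------------+
        |        Offset of first relocation item in file         |
  001AH +--------------------------------------------------------+
        |    Overlay number (0 for resident part of program)     |
  001CH +--------------------------------------------------------+
        |                Variable reserved space                 |
        +--------------------------------------------------------+
        |                    Relocation table                    |
        +--------------------------------------------------------+
        |                Variable reserved space                 |
        +--------------------------------------------------------+
        |               Program and data segments                |
        +--------------------------------------------------------+
        |                     Stack segment                      |
        +--------------------------------------------------------+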

  Figure 3-5.  The format of a .EXE load module.

  The input to the linker for a .EXE-type program can be many separate
  object modules. Each module can use a unique code-segment name, and the
  procedures can carry either the NEAR or the FAR attribute, depending on
  naming conventions and the size of the executable code. The programmer
  must take care that the modules linked together contain only one segment
  with the STACK attribute and only one entry point defined with an END
  assembler directive. The output from the linker is a file with a .EXE
  extension. This file can be executed immediately.

  
  C>DUMP HELLO.EXE
         0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
  0000  4D 5A 28 00 02 00 01 00 20 00 09 00 FF FF 03 00  MZ(..... .......
  0010  80 00 20 05 00 00 00 00 1E 00 00 00 01 00 01 00  .. .............
  0020  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  0030  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  0040  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  0050  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        .
        .
        .
  0200  B8 01 00 8E D8 B4 40 BB 01 00 B9 10 00 90 BA 08  ......@.........
  0210  00 CD 21 B8 00 4C CD 21 0D 0A 48 65 6C 6C 6F 20  ..!..L.!..Hello
  0220  57 6F 72 6C 64 21 0D 0A                          World!..
  

  Figure 3-6.  A hex dump of the HELLO.EXE program, demonstrating the
  contents of a simple .EXE load module. Note the following interesting
  values: the .EXE signature in bytes 0000H and 0001H, the number of
  relocation-table items in bytes 0006H and 0007H, the minimum extra memory
  allocation (MIN_ALLOC) in bytes 000AH and 000BH, the maximum extra memory
  allocation (MAX_ALLOC) in bytes 000CH and 000DH, and the initial IP
  (instruction pointer) register value in bytes 0014H and 0015H. See also
  Figure 3-5.
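
  The fixed part of the header can be described with a MASM STRUC, as in
  the sketch below. The structure name exehdr and all field names are
  invented for this illustration. The value shown in each comment is the
  word found at that offset in the HELLO.EXE dump of Figure 3-6.

```asm
exehdr  struc                           ; hypothetical map of .EXE header
exsig   dw      ?                       ; 0000h signature         = 5a4dh
exlast  dw      ?                       ; 0002h length MOD 512    = 0028h
expages dw      ?                       ; 0004h 512-byte pages    = 0002h
exrelct dw      ?                       ; 0006h relocation items  = 0001h
exhsize dw      ?                       ; 0008h header paragraphs = 0020h
exmin   dw      ?                       ; 000ah MIN_ALLOC         = 0009h
exmax   dw      ?                       ; 000ch MAX_ALLOC         = 0ffffh
exss    dw      ?                       ; 000eh stack segment     = 0003h
exsp    dw      ?                       ; 0010h SP at entry       = 0080h
exsum   dw      ?                       ; 0012h word checksum     = 0520h
exip    dw      ?                       ; 0014h IP at entry       = 0000h
excs    dw      ?                       ; 0016h code segment      = 0000h
exrelof dw      ?                       ; 0018h first reloc item  = 001eh
exovl   dw      ?                       ; 001ah overlay number    = 0000h
exehdr  ends
```

  Read this way, the dump is self-consistent: the file length is
  (2 - 1) x 512 + 40 = 552 bytes (0228H), which agrees with the last
  dumped byte at offset 0227H, and the header occupies 20H paragraphs, or
  512 bytes, which is why the program image begins at offset 0200H. The
  initial SP value of 0080H matches the 128-byte stack segment declared
  in the HELLO.EXE source.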

An Example .EXE Program

  The HELLO.EXE program in Figure 3-7 demonstrates the fundamental
  structure of an assembly-language program that is destined to become a
  .EXE file. At minimum, it should have a module name, a code segment, a
  stack segment, and a primary procedure that receives control of the
  computer from MS-DOS after the program is loaded. The HELLO.EXE program
  also contains a data segment to provide a more complete example.

  The NAME, TITLE, and PAGE directives were covered in the HELLO.COM example
  program and are used in the same manner here, so we'll move to the first
  new item of interest. After a few comments and EQU statements, we come to
  a declaration of a code segment that begins on line 21 with a SEGMENT
  command and ends on line 41 with an ENDS command. As in the HELLO.COM
  example program, the label in the leftmost field of the line gives the
  code segment the name _TEXT. The operand fields at the right end of the
  line give the attributes WORD, PUBLIC, and 'CODE'.

  Following the SEGMENT statement, we find an ASSUME statement on
  line 23. Notice that, unlike the equivalent statement in the HELLO.COM
  program, the ASSUME statement in this program specifies several different
  segment names. Again, remember that this statement has no direct effect on
  the contents of the segment registers but affects only the operation of
  the assembler itself.

  
   1:          name    hello
   2:          page    55,132
   3:          title   HELLO.EXE--print Hello on terminal
   4:  ;
   5:  ; HELLO.EXE:    demonstrates various components
   6:  ;               of a functional .EXE-type assembly-
   7:  ;               language program, use of segments,
   8:  ;               and an MS-DOS function call.
   9:  ;
  10:  ; Ray Duncan, May 1988
  11:  ;
  12:
  13:  stdin   equ     0               ; standard input handle
  14:  stdout  equ     1               ; standard output handle
  15:  stderr  equ     2               ; standard error handle
  16:
  17:  cr      equ     0dh             ; ASCII carriage return
  18:  lf      equ     0ah             ; ASCII linefeed
  19:
  20:
  21:  _TEXT   segment word public 'CODE'
  22:
  23:          assume  cs:_TEXT,ds:_DATA,ss:STACK
  24:
  25:  print   proc    far             ; entry point from MS-DOS
  26:
  27:          mov     ax,_DATA        ; make our data segment
  28:          mov     ds,ax           ; addressable...
  29:
  30:          mov     ah,40h          ; function 40h = write
  31:          mov     bx,stdout       ; standard output handle
  32:          mov     cx,msg_len      ; length of message
  33:          mov     dx,offset msg   ; address of message
  34:          int     21h             ; transfer to MS-DOS
  35:
  36:          mov     ax,4c00h        ; exit, return code = 0
  37:          int     21h             ; transfer to MS-DOS
  38:
  39:  print   endp
  40:
  41:  _TEXT   ends
  42:
  43:
  44:  _DATA   segment word public 'DATA'
  45:
  46:  msg     db      cr,lf           ; message to display
  47:          db      'Hello World!',cr,lf
  48:
  49:  msg_len equ     $-msg           ; length of message
  50:
  51:  _DATA   ends
  52:
  53:
  54:  STACK   segment para stack 'STACK'
  55:
  56:          db      128 dup (?)
  57:
  58:  STACK   ends
  59:
  60:          end     print           ; defines entry point
  

  Figure 3-7.  The HELLO.EXE program listing.

  Within the code segment, the main print procedure is declared by the PROC
  command on line 25 and closed with ENDP on line 39. Because the procedure
  resides in a .EXE file, we have given it the FAR attribute as an example,
  but the attribute is really irrelevant because the program is so small and
  the procedure is not called by anything else in the same program.

  The print procedure first initializes the DS register, as indicated in the
  earlier ASSUME statement, loading it with a value that causes it to point
  to the base of the data area. (MS-DOS automatically sets up the CS and SS
  registers.) Next, the procedure uses MS-DOS Int 21H Function 40H to
  display the message Hello World! on the screen, just as in the HELLO.COM
  program. Finally, the procedure exits back to MS-DOS with an Int 21H
  Function 4CH on lines 36 and 37, passing a return code of zero (which by
  convention means success).

  Lines 44 through 51 declare a data segment named _DATA, which contains the
  variables and constants the program will use. If the various modules of a
  program contain multiple data segments with the same name, the linker will
  collect them and place them in the same physical memory segment.

  Lines 54 through 58 establish a stack segment; PUSH and POP instructions
  will access this area of scratch memory. Before MS-DOS transfers control
  to a .EXE program, it sets up the SS and SP registers according to the
  declared size and location of the stack segment. Be sure to allow enough
  room for the maximum stack depth that can occur at runtime, plus a safe
  number of extra words for registers pushed onto the stack during an MS-DOS
  service call. If the stack overflows, it may damage your other code and
  data segments and cause your program to behave strangely or even to crash
  altogether!

  The END statement on line 60 winds up our brief HELLO.EXE program, telling
  the assembler that it has reached the end of the source file and providing
  the label of the program's point of entry from MS-DOS.

  The differences between .COM and .EXE programs are summarized in Figure
  3-8.


                     .COM program               .EXE program
  
  Maximum size       65,536 bytes minus 256     No limit
                     bytes for PSP and 2 bytes
                     for stack

  Entry point        PSP:0100H                  Defined by END statement

  AL at entry        00H if default FCB #1 has  Same
                     valid drive, 0FFH if
                     invalid drive

  AH at entry        00H if default FCB #2 has  Same
                     valid drive, 0FFH if
                     invalid drive

  CS at entry        PSP                        Segment containing module
                                                with entry point

  IP at entry        0100H                      Offset of entry point within
                                                its segment

  DS at entry        PSP                        PSP

  ES at entry        PSP                        PSP

  SS at entry        PSP                        Segment with STACK attribute

  SP at entry        0FFFEH or top word in      Size of segment defined with
                     available memory,          STACK attribute
                     whichever is lower

  Stack at entry     Zero word                  Initialized or uninitialized

  Stack size         65,536 bytes minus 256     Defined in segment with
                     bytes for PSP and size of  STACK attribute
                     executable code and data

  Subroutine calls   Usually NEAR               NEAR or FAR

  Exit method        Int 21H Function 4CH       Int 21H Function 4CH
                     preferred, NEAR RET if     preferred
                     MS-DOS version 1

  Size of file       Exact size of program      Size of program plus header
                                                (multiple of 512 bytes)
  


  Figure 3-8.  Summary of the differences between .COM and .EXE programs,
  including their entry conditions.


More About Assembly-Language Programs

  Now that we've looked at working examples of .COM and .EXE
  assembly-language programs, let's backtrack and discuss their elements a
  little more formally. The following discussion is based on the Microsoft
  Macro Assembler, hereafter referred to as MASM. If you are familiar with
  MASM and are an experienced assembly-language programmer, you may want to
  skip this section.

  MASM programs can be thought of as having three structural levels:

    The module level

    The segment level

    The procedure level

  Modules are simply chunks of source code that can be independently
  maintained and assembled. Segments are physical groupings of like items
  (machine code or data) within a program and a corresponding segregation of
  dissimilar items. Procedures are functional subdivisions of an executable
  program: routines that carry out a particular task.

Program Modules

  Under MS-DOS, the module-level structure consists of files containing the
  source code for individual routines. Each source file is translated by the
  assembler into a relocatable object module. An object module can reside
  alone in an individual file or with many other object modules in an
  object-module library of frequently used or related routines. The
  Microsoft Object Linker (LINK) combines object-module files, often with
  additional object modules extracted from libraries, into an executable
  program file.

  Using modules and object-module libraries reduces the size of your
  application source files (and vastly increases your productivity), because
  these files need not contain the source code for routines they have in
  common with other programs. This technique also allows you to maintain the
  routines more easily, because you need to alter only one copy of their
  source code stored in one place, instead of many copies stored in
  different applications. When you improve (or fix) one of these routines,
  you can simply reassemble it, put its object module back into the library,
  relink all of the programs that use the routine, and voilà: instant
  upgrade.
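
  As a minimal sketch of how this works in practice, the listing below shows
  two source files that are assembled separately and joined by LINK. The
  routine name newline and the filenames NEWLINE.ASM and MAIN.ASM are
  hypothetical, chosen only for illustration. The PUBLIC directive exports a
  name from the library module, and the EXTRN directive tells the assembler
  that the name will be resolved at link time.

```asm
; ---- file NEWLINE.ASM (hypothetical library module) ----

cr      equ     0dh                     ; ASCII carriage return
lf      equ     0ah                     ; ASCII line feed

_TEXT   segment word public 'CODE'

        assume  cs:_TEXT

        public  newline                 ; make the name visible to LINK

newline proc    near                    ; send CR-LF to standard output
                                        ; destroys AX, DX
        mov     dl,cr
        mov     ah,2                    ; Int 21H Function 02H =
        int     21h                     ; character output
        mov     dl,lf
        mov     ah,2
        int     21h
        ret

newline endp

_TEXT   ends

        end                             ; no entry point in this module


; ---- file MAIN.ASM (hypothetical application module) ----

_TEXT   segment word public 'CODE'

        assume  cs:_TEXT

        extrn   newline:near            ; resolved by LINK, not by MASM

main    proc    far                     ; entry point from MS-DOS

        call    newline                 ; routine lives in other module

        mov     ax,4c00h                ; exit with return code zero
        int     21h

main    endp

_TEXT   ends

STACK   segment para stack 'STACK'
        db      128 dup (?)
STACK   ends

        end     main                    ; defines entry point
```

  Because both modules give their code segment the same name, combine type,
  and class, the linker joins the two pieces into one physical segment, so
  the near call in MAIN.ASM can reach newline.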

Program Segments

  The term segments refers to two discrete programming concepts: physical
  segments and logical segments.

  Physical segments are 64 KB blocks of memory. The Intel 8086/8088 and
  80286 microprocessors have four segment registers, which are essentially
  used as pointers to these blocks. (The 80386 has six segment registers,
  which are a superset of those found on the 8086/8088 and 80286.) Each
  segment register can point to the bottom of a different 64 KB area of
  memory. Thus, a program can address any location in memory by appropriate
  manipulation of the segment registers, but the maximum amount of memory
  that it can address simultaneously is 256 KB.

  As we discussed earlier in the chapter, .COM programs assume that all four
  segment registers always point to the same place: the bottom of the
  program. Thus, they are limited to a maximum size of 64 KB. .EXE programs,
  on the other hand, can address many different physical segments and can
  reset the segment registers to point to each segment as it is needed.
  Consequently, the only practical limit on the size of a .EXE program is
  the amount of available memory. The example programs throughout the
  remainder of this book focus on .EXE programs.
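
  The following sketch shows one way a routine might point a segment
  register at a second physical segment while DS continues to address the
  primary data segment. The names BIGBUF, buffer, count, and clrbuf are
  assumptions made for this example, and the routine assumes that its caller
  has already loaded DS with the segment of _DATA.

```asm
buf_len equ     1024                    ; size of far buffer in bytes

_DATA   segment word public 'DATA'
count   dw      0                       ; lives in the near data segment
_DATA   ends

BIGBUF  segment para 'FAR_DATA'         ; hypothetical far data segment
buffer  db      buf_len dup (?)         ; (no combine type = private)
BIGBUF  ends

_TEXT   segment word public 'CODE'

        assume  cs:_TEXT,ds:_DATA

        public  clrbuf

clrbuf  proc    near                    ; fill the far buffer with zeros
                                        ; destroys AX, CX, DI, ES
        mov     ax,seg buffer           ; a segment register cannot be
        mov     es,ax                   ; loaded with an immediate value
        mov     di,offset buffer        ; ES:DI = address of buffer
        mov     cx,buf_len              ; CX = number of bytes to fill
        xor     al,al                   ; AL = fill value
        cld                             ; make STOSB count upward
        rep stosb                       ; store AL at ES:DI, CX times
        mov     count,buf_len           ; DS still addresses _DATA
        ret
clrbuf  endp

_TEXT   ends

        end
```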

  Logical segments are the program components. A minimum of three logical
  segments must be declared in any .EXE program: a code segment, a data
  segment, and a stack segment. Programs with more than 64 KB of code or
  data have more than one code or data segment. The routines or data that
  are used most frequently are put into the primary code and data segments
  for speed, and routines or data that are used less frequently are put into
  secondary code and data segments.

  Segments are declared with the SEGMENT and ENDS directives in the
  following form:

  name   SEGMENT attributes
  .
  .
  .
  name   ENDS

  The attributes of a segment include its align type (BYTE, WORD, or PARA),
  combine type (PUBLIC, PRIVATE, COMMON, or STACK), and class type. The
  segment attributes are used by the linker when it is combining logical
  segments to create the physical segments of an executable program. Most of
  the time, you can get by just fine using a small selection of attributes
  in a rather stereotypical way. However, if you want to use the full range
  of attributes, you might want to read the detailed explanation in the MASM
  manual.
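
  To make the attributes a little more concrete, here is a small annotated
  sketch. The segment name WORKBUF and the label scratch are hypothetical;
  the other declarations follow the pattern already used in HELLO.EXE. The
  comments summarize what each align type and combine type asks of the
  linker, and the MASM manual remains the authority on the details.

```asm
_TEXT   segment word public 'CODE'      ; WORD:   starts on an even address
                                        ; PUBLIC: joined end to end with
        assume  cs:_TEXT                ;         same-named segments

main    proc    far
        mov     ax,4c00h                ; exit with return code zero
        int     21h
main    endp

_TEXT   ends

WORKBUF segment para common 'DATA'      ; PARA:   starts on a 16-byte
scratch db      64 dup (?)              ;         (paragraph) boundary
WORKBUF ends                            ; COMMON: overlaid on same-named
                                        ;         segments, not joined

STACK   segment para stack 'STACK'      ; STACK:  joined like PUBLIC; also
        db      128 dup (?)             ;         used to set up SS and SP
STACK   ends

        end     main
```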

  Programs are classified into one memory model or another based on the
  number of their code and data segments. The most commonly used memory
  model for assembly-language programs is the small model, which has one
  code and one data segment, but you can also use the medium, compact, and
  large models (Figure 3-9). (Two additional models exist that we will not
  discuss further: the tiny model, which consists of intermixed code and
  data in a single segment, for example, a .COM file under MS-DOS; and the
  huge model, which is supported by the Microsoft C Optimizing Compiler and
  which allows use of data structures larger than 64 KB.)

  Model                    Code segments           Data segments
  
  Small                    One                     One
  Medium                   Multiple                One
  Compact                  One                     Multiple
  Large                    Multiple                Multiple
  

  Figure 3-9.  Memory models commonly used in assembly-language and C
  programs.

  For each memory model, Microsoft has established certain segment and class
  names that are used by all its high-level-language compilers (Figure
  3-10). Because segment names are arbitrary, you may as well adopt the
  Microsoft conventions. Their use will make it easier for you to integrate
  your assembly-language routines into programs written in languages such as
  C, or to use routines from high-level-language libraries in your
  assembly-language programs.

  Another important Microsoft high-level-language convention is to use the
  GROUP directive to name the near data segment (the segment the program
  expects to address with offsets from the DS register) and the stack
  segment as members of DGROUP (the automatic data group), a special name
  recognized by the linker and also by the program loaders in Microsoft
  Windows and Microsoft OS/2. The GROUP directive causes logical segments
  with different names to be combined into a single physical segment so that
  they can be addressed using the same segment base address. In C programs,
  DGROUP also contains the local heap, which is used by the C runtime
  library for dynamic allocation of small amounts of memory.


  Memory      Segment      Align       Combine     Class        Group
  model       name         type        type        type
  
  Small       _TEXT        WORD        PUBLIC      CODE
              _DATA        WORD        PUBLIC      DATA         DGROUP
              STACK        PARA        STACK       STACK        DGROUP

  Medium      module_TEXT  WORD        PUBLIC      CODE
              .
              .
              .
              _DATA        WORD        PUBLIC      DATA         DGROUP
              STACK        PARA        STACK       STACK        DGROUP

  Compact     _TEXT        WORD        PUBLIC      CODE
              data         PARA        PRIVATE     FAR_DATA
              .
              .
              .
              _DATA        WORD        PUBLIC      DATA         DGROUP
              STACK        PARA        STACK       STACK        DGROUP

  Large       module_TEXT  WORD        PUBLIC      CODE
              .
              .
              .
              data         PARA        PRIVATE     FAR_DATA
              .
              .
              .
              _DATA        WORD        PUBLIC      DATA         DGROUP
              STACK        PARA        STACK       STACK        DGROUP
  


  Figure 3-10.  Segments, groups, and classes for the standard memory models
  as used with assembly-language programs. The Microsoft C Optimizing
  Compiler and other high-level-language compilers use a superset of these
  segments and classes.

  For pure assembly-language programs that will run under MS-DOS, you can
  ignore DGROUP. However, if you plan to integrate assembly-language
  routines and programs written in high-level languages, you'll want to
  follow the Microsoft DGROUP convention. For example, if you are planning
  to link routines from a C library into an assembly-language program, you
  should include the line

  DGROUP group _DATA,STACK

  near the beginning of the program.
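
  A minimal sketch of a small-model program that follows the DGROUP
  convention might look like the listing below. The message text and the
  procedure name main are illustrative assumptions. Note that DS is loaded
  with the base of the group rather than the base of _DATA, and that the
  offset of msg is requested relative to DGROUP with an explicit group
  override.

```asm
DGROUP  group   _DATA,STACK             ; near data and stack share a base

_TEXT   segment word public 'CODE'

        assume  cs:_TEXT,ds:DGROUP

main    proc    far                     ; entry point from MS-DOS

        mov     ax,DGROUP               ; DS = base of the group, not
        mov     ds,ax                   ; the base of _DATA alone

        mov     dx,offset DGROUP:msg    ; DS:DX = address of message,
                                        ; offset relative to DGROUP
        mov     cx,msg_len              ; CX = length of message
        mov     bx,1                    ; BX = handle for standard output
        mov     ah,40h                  ; Int 21H Function 40H =
        int     21h                     ; write to file or device

        mov     ax,4c00h                ; exit with return code zero
        int     21h

main    endp

_TEXT   ends

_DATA   segment word public 'DATA'

msg     db      'Hello from DGROUP',0dh,0ah

msg_len equ     $-msg                   ; length of message

_DATA   ends

STACK   segment para stack 'STACK'
        db      128 dup (?)
STACK   ends

        end     main                    ; defines entry point
```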

  The final Microsoft convention of interest in creating .EXE programs is
  segment order. The high-level compilers assume that code segments always
  come first, followed by far data segments, followed by the near data
  segment, with the stack and heap last. This order won't concern you much
  until you begin integrating assembly-language code with routines from
  high-level-language libraries, but it is easiest to learn to use the
  convention right from the start.

Program Procedures

  The procedure level of program structure is partly real and partly
  conceptual. Procedures are basically just a fancy guise for subroutines.

  Procedures within a program are declared with the PROC and ENDP directives
  in the following form:

  name   PROC attribute
  .
  .
  .
         RET
  name   ENDP

  The attribute carried by a PROC declaration, which is either NEAR or FAR,
  tells the assembler what type of call you expect to use to enter the
  procedure, that is, whether the procedure will be called from other
  routines in the same segment or from routines in other segments. When the
  assembler encounters a RET instruction within the procedure, it uses the
  attribute information to generate the correct opcode for either a near
  (intra-segment) or far (inter-segment) return.
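
  The sketch below illustrates the two attributes side by side. The
  procedure names twice and thrice and the segment name AUX_TEXT are
  hypothetical. The same RET mnemonic appears in both procedures, but the
  assembler generates a different opcode for each according to the
  attribute on the enclosing PROC.

```asm
_TEXT   segment word public 'CODE'

        assume  cs:_TEXT

main    proc    far                     ; entry point from MS-DOS

        mov     ax,5
        call    twice                   ; near call: pushes IP only
        call    far ptr thrice          ; far call: pushes CS, then IP
                                        ; AX is now 5 * 2 * 3 = 30
        mov     ah,4ch                  ; exit; AL = 30 becomes the
        int     21h                     ; return code

main    endp

twice   proc    near                    ; AX = AX * 2
        add     ax,ax
        ret                             ; assembled as near return (0C3H)
twice   endp

_TEXT   ends

AUX_TEXT segment word public 'CODE'     ; hypothetical second code segment

        assume  cs:AUX_TEXT

thrice  proc    far                     ; AX = AX * 3, destroys BX
        mov     bx,ax
        add     ax,ax
        add     ax,bx
        ret                             ; assembled as far return (0CBH)
thrice  endp

AUX_TEXT ends

STACK   segment para stack 'STACK'
        db      128 dup (?)
STACK   ends

        end     main                    ; defines entry point
```

  If thrice were mistakenly reached with a near call, its far return would
  pop one more word from the stack than the call had pushed, and control
  would not come back to the caller correctly.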

  Each program should have a main procedure that receives control from
  MS-DOS. You specify the entry point for the program by including the name
  of the main procedure in the END statement in one of the program's source
  files. The main procedure's attribute (NEAR or FAR) is really not too
  important, because the program returns control to MS-DOS with a function
  call rather than a RET instruction. However, by convention, most
  programmers assign the main procedure the FAR attribute anyway.

  You should break the remainder of the program into procedures in an
  orderly way, with each procedure performing a well-defined single
  function, returning its results to its caller, and avoiding actions that
  have global effects within the program. Ideally, procedures invoke each
  other only by CALL instructions, have only one entry point and one exit
  point, and always exit by means of a RET instruction, never by jumping to
  some other location within the program.

  For ease of understanding and maintenance, a procedure should not exceed
  one page (about 60 lines); if it is longer than a page, it is probably too
  complex and you should delegate some of its function to one or more
  subsidiary procedures. You should preface the source code for each
  procedure with a detailed comment that states the procedure's calling
  sequence, results returned, registers affected, and any data items
  accessed or modified. The effort invested in making your procedures
  compact, clean, flexible, and well-documented will be repaid many times
  over when you reuse the procedures in other programs.
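
  As an illustration of such a preface, here is a sketch of a short
  procedure with a header comment. The routine strlen and the layout of the
  comment are our own assumptions, not a required format; any consistent
  layout that covers the same items will serve.

```asm
_TEXT   segment word public 'CODE'

        assume  cs:_TEXT

        public  strlen

;
; Name:         strlen
;
; Function:     Find length of a zero-terminated (ASCIIZ) string
;
; Call with:    DS:SI = address of string
;
; Returns:      CX    = length in bytes, not counting the zero byte
;
; Destroys:     flags (direction flag is left clear);
;               AX and SI are preserved
;
; Data:         reads the caller's string; modifies no memory
;
strlen  proc    near

        push    ax                      ; save registers we will use
        push    si
        xor     cx,cx                   ; initialize length to zero
        cld                             ; make LODSB count upward

sl1:    lodsb                           ; AL = next byte of string
        or      al,al                   ; is it the terminating zero?
        jz      sl2                     ; yes, finished
        inc     cx                      ; no, count it
        jmp     sl1                     ; and look at the next byte

sl2:    pop     si                      ; restore registers
        pop     ax
        ret                             ; single exit point

strlen  endp

_TEXT   ends

        end
```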



